Results 1 - 20 of 77,049
1.
Bull Math Biol ; 86(6): 63, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664322

ABSTRACT

In this study, we present a mathematical model for plasmid spread in a growing biofilm, formulated as a nonlocal system of partial differential equations in a 1-D free boundary domain. Plasmids are mobile genetic elements able to transfer to different phylotypes, posing a global health problem when they carry antibiotic resistance factors. We model gene transfer regulation influenced by nearby potential receptors to account for recipient-sensing. We also introduce a promotion function, based on literature data, to account for the effects of trace metals on conjugation. The model qualitatively matches experimental results, showing that contaminants such as toxic metals and antibiotics promote plasmid persistence by favoring plasmid carriers and stimulating conjugation. Even at contaminant concentrations high enough to inhibit conjugation, plasmids keep spreading because the contaminants strongly inhibit plasmid-free cells. The model also reproduces the higher plasmid density in the biofilm's most active regions.


Subjects
Biofilms , Gene Transfer, Horizontal , Mathematical Concepts , Models, Biological , Models, Genetic , Plasmids , Biofilms/growth & development , Plasmids/genetics , Conjugation, Genetic , Anti-Bacterial Agents/pharmacology
2.
Genome Med ; 16(1): 62, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664839

ABSTRACT

The "missing" heritability of complex traits may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. We propose a new kernel-based method called Latent Interaction Testing (LIT) to screen for genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Using simulated data, we demonstrate that LIT increases power to detect latent genetic interactions compared to univariate methods. We then apply LIT to obesity-related traits in the UK Biobank and detect variants with interactive effects near known obesity-related genes (URL: https://CRAN.R-project.org/package=lit ).


Subjects
Genome-Wide Association Study , Obesity , Humans , Obesity/genetics , Epistasis, Genetic , Quantitative Trait, Heritable , Quantitative Trait Loci , Models, Genetic , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease , Genetic Pleiotropy , Phenotype , Multifactorial Inheritance
3.
BMC Genomics ; 25(1): 386, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641604

ABSTRACT

BACKGROUND: The growth and development of organisms depend on genetic effects, environmental effects, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected through genome-wide association studies (GWAS). However, owing to limited computing power and a lack of practical tools, the interactive effects of markers and genes have not been clearly revealed, and such interactive markers are difficult to use in breeding and prediction, for example in genomic selection (GS). RESULTS: According to Power-FDR curves, the GbyE algorithm detects more significant genetic loci at different levels of genetic correlation and heritability, especially at low heritability. The additive effect of GbyE is highly significant on certain chromosomes, while the interactive effect detects more significant sites on other chromosomes that were not detected in the first two parts. In prediction accuracy tests, GbyE is significantly more accurate than the mean method for most combinations of heritability and genetic correlation, whether the rrBLUP or the BGLR model is used. Using information from genotype-by-environment interaction (G × E), the GbyE algorithm improves the prediction accuracy of the three Bayesian models BRR, BayesA, and BayesLASSO by 9.4%, 9.1%, and 11%, respectively, relative to the mean method. When phenotypes are missing for a single environment, GbyE is significantly superior to the mean method for every combination of heritability and genetic correlation, especially when both are high. CONCLUSIONS: This study constructed a new genotype design model program (GbyE) for GWAS and GS using the Kronecker product, which estimates the additive and interactive effects separately.
The results show that GbyE provides higher statistical power for GWAS and higher prediction accuracy for GS models. In addition, GbyE improves prediction accuracy to varying degrees in three Bayesian models (BRR, BayesA, and BayesCpi). Whether phenotypes are missing in a single environment or in multiple environments, GbyE also makes better predictions for the inference population. This study helps us understand the interaction between genome and environment in complex traits. The GbyE source code is available on GitHub ( https://github.com/liu-xinrui/GbyE ).
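The abstract says GbyE builds its genotype design with a Kronecker product but does not spell out the encoding. The Python sketch below shows one common way to stack a marker matrix X (individuals × markers) across e environments, assuming phenotypes are stacked environment by environment: kron(1_e, X) gives one shared (additive) effect per marker, and kron(I_e, X) gives a separate effect per marker per environment (the interaction part). The function names and this exact encoding are assumptions for illustration, not taken from the GbyE source code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices stored as lists of rows."""
    rows = []
    for a_row in a:
        for b_row in b:
            # column index j*q + l holds a[i][j] * b[k][l]
            rows.append([x * y for x in a_row for y in b_row])
    return rows


def gbye_style_designs(x: Matrix, n_env: int) -> Tuple[Matrix, Matrix]:
    """Hypothetical additive and interaction designs for phenotypes stacked by environment."""
    ones = [[1.0] for _ in range(n_env)]                      # e x 1 column of ones
    identity = [[1.0 if i == j else 0.0 for j in range(n_env)] for i in range(n_env)]
    additive = kron(ones, x)         # (e*n) x m: the same marker effect in every environment
    interaction = kron(identity, x)  # (e*n) x (e*m): one marker effect per environment
    return additive, interaction


# Two individuals, two markers (0/1/2 genotype codes), two environments.
X = [[0, 1], [2, 1]]
additive, interaction = gbye_style_designs(X, 2)
for row in interaction:
    print(row)
```

Separating the two blocks in this way is what lets a mixed model estimate additive and interactive marker effects as distinct terms.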


Subjects
Quantitative Trait Loci , Selection, Genetic , Bayes Theorem , Models, Genetic , Phenotype , Genotype , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide
4.
J Math Biol ; 88(5): 58, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38584237

ABSTRACT

It was recently shown that a large class of phylogenetic networks, the 'labellable' networks, is in bijection with the set of 'expanding' covers of finite sets. In this paper, we show how several prominent classes of phylogenetic networks can be characterised purely in terms of properties of their associated covers. These classes include the tree-based, tree-child, orchard, tree-sibling, and normal networks. In the opposite direction, we give an example of how a restriction on the set of expanding covers can define a new class of networks, which we call 'spinal' phylogenetic networks.


Subjects
Algorithms , Models, Genetic , Humans , Phylogeny
5.
PLoS Comput Biol ; 20(4): e1012027, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38598558

ABSTRACT

Although the length and constituting sequences for pericentromeric repeats are highly variable across eukaryotes, the presence of multiple pericentromeric repeats is one of the conserved features of the eukaryotic chromosomes. Pericentromeric heterochromatin is often misregulated in human diseases, with the expansion of pericentromeric repeats in human solid cancers. In this article, we have developed a mathematical model of the RNAi-dependent methylation of H3K9 in the pericentromeric region of fission yeast. Our model, which takes copy number as an explicit parameter, predicts that the pericentromere is silenced only if there are many copies of repeats. It becomes bistable or desilenced if the copy number of repeats is reduced. This suggests that the copy number of pericentromeric repeats alone can determine the fate of heterochromatin silencing in fission yeast. Through sensitivity analysis, we identified parameters that favor bistability and desilencing. Stochastic simulation shows that faster cell division and noise favor the desilenced state. These results show the unexpected role of pericentromeric repeat copy number in gene silencing and provide a quantitative basis for how the copy number allows or protects repetitive and unique parts of the genome from heterochromatin silencing, respectively.


Subjects
Centromere , Heterochromatin , Schizosaccharomyces , Heterochromatin/metabolism , Heterochromatin/genetics , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , Centromere/metabolism , Centromere/genetics , Models, Genetic , Computational Biology , Gene Silencing , Repetitive Sequences, Nucleic Acid/genetics , Humans , Histones/metabolism , Histones/genetics
6.
Genome Biol Evol ; 16(4), 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38608148

ABSTRACT

Nucleotide diversity at a site is influenced by the relative strengths of neutral and selective population genetic processes. Therefore, attempts to estimate effective population size from the diversity of synonymous sites require a better understanding of the selective constraints on those sites. The nucleotide diversity of a gene was previously found to correlate with its length. In this work, I measure nucleotide diversity at synonymous sites and uncover a pattern of low diversity towards the translation initiation site of a gene. The degree of reduction in diversity at the translation initiation site and the length of this region of reduced diversity can be quantified as "Effect Size" and "Effect Length", respectively, using parameters of an asymptotic regression model. Estimates of Effect Length across bacteria covaried with recombination rates as well as with a multitude of translation-associated traits, such as the avoidance of mRNA secondary structure around the translation initiation site, the number of rRNAs, and the relative codon usage of ribosomal genes. Evolutionary simulations under purifying selection reproduce the observed patterns and the diversity-length correlation, and highlight that selective constraints on the 5'-region of a gene may be more extensive than previously believed. These results have implications for the estimation of effective population size and relative mutation rates, and for genome scans for genes under positive selection based on "silent-site" diversity.
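The abstract does not give the functional form of the asymptotic regression, so the sketch below assumes a common one: diversity at distance x from the translation initiation site is π(x) = plateau − size·exp(−x / length), with "Effect Size" mapped to size and "Effect Length" to length. Both the parameterization and the fitting method (grid search over length, with exact least squares for the other two parameters) are assumptions chosen to keep the example dependency-free; they are not the author's procedure.

```python
import math
from typing import Sequence, Tuple


def _simple_regression(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    slope = szy / szz
    return my - slope * mz, slope


def fit_asymptotic(x: Sequence[float], y: Sequence[float],
                   length_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Fit pi(x) = plateau - size * exp(-x / length); returns (plateau, size, length).

    For a fixed length the model is linear in (plateau, size), so both are solved
    exactly by least squares; the grid length with the smallest SSE is kept.
    """
    best = None
    for length in length_grid:
        z = [math.exp(-xi / length) for xi in x]
        intercept, slope = _simple_regression(z, y)
        sse = sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, intercept, -slope, length)
    _, plateau, size, length = best
    return plateau, size, length


# Synthetic, noise-free diversity profile (hypothetical values, codon positions 0..290).
positions = list(range(0, 300, 10))
diversity = [0.05 - 0.03 * math.exp(-p / 60) for p in positions]
print(fit_asymptotic(positions, diversity, [float(g) for g in range(10, 201, 10)]))
```

On real per-position diversity estimates, a finer grid or a nonlinear optimizer would be needed to locate the length parameter precisely.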


Subjects
Evolution, Molecular , Genetic Variation , Selection, Genetic , Models, Genetic , Nucleotides/genetics , Codon Usage , Peptide Chain Initiation, Translational
7.
Methods Mol Biol ; 2757: 461-490, 2024.
Article in English | MEDLINE | ID: mdl-38668979

ABSTRACT

Understanding gene evolution across genomes and organisms, including ctenophores, can provide unexpected biological insights. It enables powerful integrative approaches that leverage sequence diversity to advance biomedicine. Sequencing and bioinformatic tools can be inexpensive and user-friendly, but numerous options and coding can intimidate new users. Distinct challenges exist in working with data from diverse species but may go unrecognized by researchers accustomed to gold-standard genomes. Here, we provide a high-level workflow and detailed pipeline to enable animal collection, single-molecule sequencing, and phylogenomic analysis of gene and species evolution. As a demonstration, we focus on (1) PacBio RNA-seq of the genome-sequenced ctenophore Mnemiopsis leidyi, (2) diversity and evolution of the mechanosensitive ion channel Piezo in genetic models and basal-branching animals, and (3) associated challenges and solutions to working with diverse species and genomes, including gene model updating and repair using single-molecule RNA-seq. We provide a Python Jupyter Notebook version of our pipeline (GitHub Repository: Ctenophore-Ocean-To-Tree-2023 https://github.com/000generic/Ctenophore-Ocean-To-Tree-2023 ) that can be run for free in the Google Colab cloud to replicate our findings or modified for specific or greater use. Our protocol enables users to design new sequencing projects in ctenophores, marine invertebrates, or other novel organisms. It provides a simple, comprehensive platform that can ease new user entry into running their evolutionary sequence analyses.


Subjects
Ctenophora , Evolution, Molecular , Phylogeny , RNA-Seq , Animals , RNA-Seq/methods , Ctenophora/genetics , Ctenophora/classification , Genome/genetics , Computational Biology/methods , Software , Genomics/methods , Models, Genetic
8.
J Anim Sci ; 102, 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38576313

ABSTRACT

Accurate genetic parameters are crucial for predicting breeding values and selection responses in breeding programs. Genetic parameters change with selection, reducing additive genetic variance and changing genetic correlations. This study investigates the dynamic changes in genetic parameters for residual feed intake (RFI), gain (GAIN), breast percentage (BP), and femoral head necrosis (FHN) in a broiler population that undergoes selection, both with and without the use of genomic information. Changes in single nucleotide polymorphism (SNP) effects were also investigated when including genomic information. The dataset, containing 200,093 phenotypes for RFI, 42,895 for BP, 203,060 for GAIN, and 63,349 for FHN, was obtained from 55 mating groups. The pedigree included 1,252,619 purebred broilers, of which 154,318 were genotyped with a 60K Illumina Chicken SNP BeadChip. A Bayesian approach within the GIBBSF90+ software was applied to estimate the genetic parameters for single-, two-, and four-trait models with sliding time intervals. For all models, we used genomic-based (GEN) and pedigree-based (PED) approaches, i.e., with and without genotypes. For GEN (PED), heritability varied from 0.19 to 0.2 (0.31 to 0.21) for RFI, 0.18 to 0.11 (0.25 to 0.14) for GAIN, 0.45 to 0.38 (0.61 to 0.47) for BP, and 0.35 to 0.24 (0.53 to 0.28) for FHN, across the intervals. Changes in genetic correlations estimated by GEN (PED) were 0.32 to 0.33 (0.12 to 0.25) for RFI-GAIN, -0.04 to -0.27 (-0.18 to -0.27) for RFI-BP, -0.04 to -0.07 (-0.02 to -0.08) for RFI-FHN, -0.04 to 0.04 (0.06 to 0.2) for GAIN-BP, -0.17 to -0.06 (-0.02 to -0.01) for GAIN-FHN, and 0.02 to 0.07 (0.06 to 0.07) for BP-FHN. Heritabilities tended to decrease over time while genetic correlations showed both increases and decreases depending on the traits.
Similar to heritabilities, correlations between SNP effects declined from 0.78 to 0.2 for RFI, 0.8 to 0.2 for GAIN, 0.73 to 0.16 for BP, and 0.71 to 0.14 for FHN over the eight intervals with genomic information, suggesting potential epistatic interactions affecting genetic trait architecture. Given rapid genetic architecture changes and differing estimates between genomic and pedigree-based approaches, using more recent data and genomic information to estimate variance components is recommended for populations undergoing genomic selection to avoid potential biases in genetic parameters.


Genetic parameters are used to predict breeding values for individuals in breeding programs undergoing selection. However, inaccurate genetic parameters can cause breeding values to be biased, and genetic parameters can change over time due to multiple factors. This study aimed to investigate how genetic parameters changed over time in a broiler population using time intervals and observing the behavior of single nucleotide polymorphism (SNP) effects. We studied four traits related to production and disorders while also studying the impact of using genomic information on the estimates. Genetic variances showed an overall decreasing trend, whereas residual variances increased during each interval, resulting in decreasing heritability estimates. Genetic correlations between traits varied but with no major changes over time. Estimates tended to be lower when genomic information was included in the analysis. SNP effects showed changes over time, indicating changes to the genetic background of this population. Using outdated variance components in a population under selection may not represent the current population. Furthermore, when genomic selection is practiced, accounting for this information while estimating variance components is important to avoid biases.
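The summary's reasoning (falling genetic variance plus rising residual variance gives falling heritability) follows from the definition of narrow-sense heritability, h² = σ²_A / (σ²_A + σ²_E). The Python sketch below makes that arithmetic explicit and adds the Pearson correlation that underlies comparisons of SNP effects between intervals. The variance components are hypothetical numbers chosen to show the trend, not estimates from this study.

```python
from typing import Sequence


def heritability(var_additive: float, var_residual: float) -> float:
    """Narrow-sense heritability h^2 = sigma_A^2 / (sigma_A^2 + sigma_E^2)."""
    return var_additive / (var_additive + var_residual)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between SNP effects estimated in two time intervals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (sigma_A^2, sigma_E^2) per interval: genetic variance falls, residual rises.
intervals = [(1.0, 3.0), (0.9, 3.2), (0.8, 3.4)]
h2 = [heritability(va, ve) for va, ve in intervals]
print([round(v, 3) for v in h2])  # heritability declines from one interval to the next
```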


Subjects
Chickens , Polymorphism, Single Nucleotide , Selection, Genetic , Animals , Chickens/genetics , Male , Female , Breeding , Pedigree , Genotype , Poultry Diseases/genetics , Genomics , Phenotype , Bayes Theorem , Models, Genetic
9.
Theor Appl Genet ; 137(5): 108, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637355

ABSTRACT

KEY MESSAGE: The integration of genomic prediction with crop growth models enabled the estimation of missing environmental variables, which improved the prediction accuracy of grain yield. Since the invention of whole-genome prediction (WGP) more than two decades ago, breeding programmes have established extensive reference populations that are cultivated under diverse environmental conditions. The introduction of the CGM-WGP model, which integrates crop growth models (CGM) with WGP, has expanded the applications of WGP to the prediction of unphenotyped traits in untested environments, including future climates. However, unlike WGP, CGMs require multiple seasonal environmental records, which makes CGM-WGP less accurate when applied to historical reference populations that lack crucial environmental inputs. Here, we investigated the ability of CGM-WGP to approximate missing environmental variables to improve prediction accuracy. Two environmental variables in a wheat CGM, initial soil water content (InitlSoilWCont) and initial nitrate profile, were sampled from different normal distributions, separately or jointly, in each iteration within the CGM-WGP algorithm. Our results showed that sampling InitlSoilWCont alone gave the best results and improved the prediction accuracy of grain number by 0.07, yield by 0.06 and protein content by 0.03. When using the sampled InitlSoilWCont values as an input for the traditional CGM, the average narrow-sense heritability of the genotype-specific parameters (GSPs) improved by 0.05, with GNSlope, PreAnthRes, and VernSen showing the greatest improvements. Moreover, the root mean square error for grain number and yield was reduced by about 7% for CGM and 31% for CGM-WGP when using the sampled InitlSoilWCont values.
Our results demonstrate the advantage of sampling missing environmental variables in CGM-WGP to improve prediction accuracy and increase the size of the reference population by enabling the utilisation of historical data that are missing environmental records.


Subjects
Plant Breeding , Triticum , Triticum/genetics , Genome , Genomics/methods , Genotype , Phenotype , Edible Grain/genetics , Models, Genetic
10.
J Chem Phys ; 160(13), 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38573847

ABSTRACT

Intragenic translational heterogeneity describes the variation in translation at the level of transcripts for an individual gene. A factor that contributes to this source of variation is the mRNA structure. Both the composition of the thermodynamic ensemble, i.e., the stationary distribution of mRNA structures, and the switching dynamics between those play a role. The effect of the switching dynamics on intragenic translational heterogeneity remains poorly understood. We present a stochastic translation model that accounts for mRNA structure switching and is derived from a Markov model via approximate stochastic filtering. We assess the approximation on various timescales and provide a method to quantify how mRNA structure dynamics contributes to translational heterogeneity. With our approach, we allow quantitative information on mRNA switching from biophysical experiments or coarse-grain molecular dynamics simulations of mRNA structures to be included in gene regulatory chemical reaction network models without an increase in the number of species. Thereby, our model bridges a gap between mRNA structure kinetics and gene expression models, which we hope will further improve our understanding of gene regulatory networks and facilitate genetic circuit design.
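As a point of reference for the kind of Markov model the authors start from, the Python sketch below runs an exact stochastic (Gillespie) simulation of a single mRNA that switches between an "open" structure, where translation fires at rate k_tl, and a "closed" structure, where it cannot. It is a simplified two-state model with hypothetical rates, not the authors' approximate filtered model. At stationarity the expected protein count over a window T is k_tl · T · k_open / (k_open + k_close); slow switching spreads the per-transcript counts beyond what a constant-rate process would give, which is the kind of intragenic heterogeneity the abstract refers to.

```python
import random
from typing import Optional


def simulate_translation(k_open: float, k_close: float, k_tl: float, t_end: float,
                         start_open: bool = True,
                         rng: Optional[random.Random] = None) -> int:
    """Gillespie simulation of one mRNA switching between open and closed structures.

    Returns the number of translation events (proteins) produced before t_end.
    """
    rng = rng or random.Random()
    t, is_open, proteins = 0.0, start_open, 0
    while True:
        # propensities available in the current structural state
        a_switch, a_tl = (k_close, k_tl) if is_open else (k_open, 0.0)
        a_total = a_switch + a_tl
        if a_total == 0.0:
            break                          # absorbing state: nothing can happen any more
        t += rng.expovariate(a_total)      # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_tl:  # choose translation with probability a_tl / a_total
            proteins += 1
        else:
            is_open = not is_open          # structural switch
    return proteins


rng = random.Random(42)
runs = [simulate_translation(10.0, 10.0, 1.0, 100.0, rng=rng) for _ in range(200)]
print(sum(runs) / len(runs))  # should be near k_tl * T * k_open / (k_open + k_close) = 50
```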


Subjects
Gene Regulatory Networks , Models, Genetic , RNA, Messenger/genetics , Stochastic Processes
11.
BMC Bioinformatics ; 25(1): 144, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38575890

ABSTRACT

BACKGROUND: Joint analysis of multiple phenotypes in studies of biological systems, such as genome-wide association studies, is critical for revealing the functional interactions between various traits and genetic variants, but the growing dimensionality of the data has made joint analysis very challenging. To handle the large number of variables, we consider the sliced inverse regression (SIR) method. Specifically, we propose a novel SIR-based association test that is robust and powerful in testing the association between multiple predictors and multiple outcomes. RESULTS: We conduct simulation studies in both low- and high-dimensional settings with various numbers of single-nucleotide polymorphisms, taking the correlation structure of the traits into account. Simulation results show that the proposed method outperforms existing methods. We also successfully apply our method to a genetic association study of the ADNI dataset. Both the simulation studies and the real data analysis show that the SIR-based association test is valid and more efficient than its competitors. CONCLUSION: Several scenarios with low- and high-dimensional responses and genotypes are considered in this paper. Our SIR-based method controls the estimated type I error at the pre-specified level α.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Phenotype , Genotype , Computer Simulation , Genetic Association Studies , Models, Genetic
12.
Proc Natl Acad Sci U S A ; 121(18): e2316302121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38657048

ABSTRACT

Bacteria are nonsexual organisms but are capable of exchanging DNA to varying degrees through homologous recombination. Intriguingly, recombination rates vary immensely across lineages: some species have been described as purely clonal and others as "quasi-sexual." However, estimating recombination rates has proven difficult, and estimates often vary substantially across studies. It is unclear whether these variations reflect natural variation across populations or differences in methodology. Consequently, the impact of recombination on bacterial evolution has not been extensively evaluated, and the evolution of recombination rate as a trait remains to be accurately described. Here, we developed an approach based on Approximate Bayesian Computation that integrates multiple signals of recombination to estimate recombination rates. We inferred the recombination rate of 162 bacterial species and one archaeon and tested the robustness of our approach. Our results confirm that recombination rates vary drastically across bacteria; however, we found that recombination rate, as a trait, is conserved in several lineages but evolves rapidly in others. Although some traits are thought to be associated with recombination rate (e.g., GC-content), we found no clear association between genomic or phenotypic traits and recombination rate. Overall, our results provide an overview of recombination rate, its evolution, and its impact on bacterial evolution.
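Approximate Bayesian Computation replaces an intractable likelihood with simulation: draw a parameter from the prior, simulate data, and keep the draw only if a summary of the simulated data is close to the observed summary. The Python sketch below is the plain rejection form of ABC on a deliberately tiny toy problem, a single per-site event probability p summarized by one count. The model, prior, tolerance, and numbers are all hypothetical; the study's approach integrates multiple recombination signals and is not reproduced here.

```python
import random
from typing import List


def abc_rejection(k_obs: int, n_sites: int, n_draws: int, tolerance: int,
                  rng: random.Random) -> List[float]:
    """Rejection ABC for a single per-site event probability p (toy model)."""
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                                      # prior: p ~ Uniform(0, 1)
        k_sim = sum(rng.random() < p for _ in range(n_sites))  # simulate a Binomial(n_sites, p) count
        if abs(k_sim - k_obs) <= tolerance:                   # keep draws whose summary is close
            accepted.append(p)
    return accepted


rng = random.Random(1)
posterior = abc_rejection(k_obs=15, n_sites=50, n_draws=10_000, tolerance=1, rng=rng)
print(len(posterior), sum(posterior) / len(posterior))  # accepted draws, posterior mean of p
```

Shrinking the tolerance brings the accepted sample closer to the true posterior at the cost of accepting fewer draws, which is the central trade-off in any ABC design.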


Subjects
Bacteria , Bayes Theorem , Evolution, Molecular , Homologous Recombination , Bacteria/genetics , Bacteria/classification , Models, Genetic , Phylogeny , Genome, Bacterial , Recombination, Genetic
13.
Theor Appl Genet ; 137(5): 104, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622324

ABSTRACT

KEY MESSAGE: Selection response in truncation selection across multiple sets of candidates hinges on their post-selection proportions, which can deviate grossly from their initial proportions. For BLUPs, using a uniform threshold for all candidates maximizes the selection response, irrespective of differences in population parameters. Plant breeding programs typically involve multiple families from either the same or different populations, varying in means, genetic variances and prediction accuracy of BLUPs or BLUEs for true genetic values (TGVs) of candidates. We extend the classical breeder's equation for truncation selection from single to multiple sets of genotypes, indicating that the expected overall selection response ($\Delta G_{\mathrm{Tot}}$) for TGVs depends on the selection response within individual sets and their post-selection proportions. For BLUEs, we show that maximizing $\Delta G_{\mathrm{Tot}}$ requires thresholds optimally tailored for each set, contingent on their population parameters. For BLUPs, we prove that $\Delta G_{\mathrm{Tot}}$ is maximized by applying a uniform threshold across all candidates from all sets. We provide explicit formulas for the origin of the selected candidates from different sets and show that their proportions before and after selection can differ substantially, especially for sets with inferior properties and low proportion. We discuss implications of these results for (a) optimum allocation of resources to training and prediction sets and (b) the need to counteract narrowing the genetic variation under genomic selection. For genomic selection of hybrids based on BLUPs of GCA of their parent lines, selecting distinct proportions in the two parent populations can be advantageous, if these differ substantially in the variance and/or prediction accuracy of GCA.
Our study sheds light on the complex interplay of selection thresholds and population parameters for the selection response in plant breeding programs, offering insights into the effective resource management and prudent application of genomic selection for improved crop development.
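The BLUP case can be made concrete. The Python sketch below assumes that the predicted values within each set are normally distributed with a known mean and SD, applies one common threshold chosen so that a fraction α of all candidates is selected, and returns each set's post-selection proportion together with the mean predicted value of the selected candidates. Under the usual BLUP assumptions (E[TGV | BLUP] = BLUP), that mean is also the expected mean TGV of the selected candidates, so subtracting the overall population mean gives $\Delta G_{\mathrm{Tot}}$. The normality assumption, the input format, and the example numbers are illustrative, not the paper's formulas.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple

STANDARD_NORMAL = NormalDist()


def uniform_threshold_selection(sets: Sequence[Tuple[float, float, float]],
                                alpha: float) -> Tuple[float, List[float], float]:
    """Select a fraction alpha of all candidates with one common threshold.

    Each set is (initial_proportion, mean, sd) of its predicted values, assumed normal;
    initial proportions sum to 1. Returns (threshold, post-selection proportions,
    mean predicted value of the selected candidates).
    """
    def selected_fraction(t: float) -> float:
        return sum(w * (1.0 - STANDARD_NORMAL.cdf((t - mu) / sd)) for w, mu, sd in sets)

    lo = min(mu - 10.0 * sd for _, mu, sd in sets)
    hi = max(mu + 10.0 * sd for _, mu, sd in sets)
    for _ in range(200):             # bisection: selected_fraction decreases as t grows
        mid = 0.5 * (lo + hi)
        if selected_fraction(mid) > alpha:
            lo = mid                 # too many selected, so raise the threshold
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    proportions: List[float] = []
    total = 0.0
    for w, mu, sd in sets:
        z = (t - mu) / sd
        tail = 1.0 - STANDARD_NORMAL.cdf(z)  # fraction of this set above the threshold
        proportions.append(w * tail / alpha)
        # w * tail * E[X | X > t] for a normal X simplifies to w * (tail * mu + sd * pdf(z))
        total += w * (tail * mu + sd * STANDARD_NORMAL.pdf(z))
    return t, proportions, total / alpha


# A large set and a small, inferior set (hypothetical parameters), selecting 10% overall.
print(uniform_threshold_selection([(0.9, 0.0, 1.0), (0.1, -1.0, 1.0)], alpha=0.1))
```

In this example the inferior set's share among the selected falls well below its initial 10%, in line with the abstract's remark that post-selection proportions can deviate strongly from initial ones.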


Subjects
Plant Breeding , Selection, Genetic , Humans , Plant Breeding/methods , Genotype , Plants/genetics , Genomics/methods , Models, Genetic , Phenotype
14.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581421

ABSTRACT

Boolean models of gene regulatory networks (GRNs) have gained widespread traction because they can easily recapitulate cellular phenotypes via their attractor states. Their overall dynamics are embodied in a state transition graph (STG). Indeed, two Boolean networks (BNs) with the same network structure and attractors can have drastically different STGs depending on the type of Boolean functions (BFs) employed. Our objective here is to systematically delineate the effects of different classes of BFs on the structural features of the STG of reconstructed Boolean GRNs while keeping network structure and biological attractors fixed, and to explore the characteristics of BFs that drive those features. Using 10 reconstructed Boolean GRNs, we generate ensembles that differ in BFs and compute from their STGs the dynamics' rate of contraction or 'bushiness' and rate of 'convergence', quantified with measures inspired by cellular automata (CA) that are based on the garden-of-Eden (GoE) states. We find that biologically meaningful BFs lead to higher STG 'bushiness' and 'convergence' than random ones. Obtaining such 'global' measures becomes computationally expensive for larger networks, stressing the need for feasible proxies. We therefore adapt Wuensche's $Z$-parameter in CA to BFs in BNs and provide four natural variants, which, along with the average sensitivity of BFs computed at the network level, comprise our descriptors of local dynamics; we find some of them to be good proxies for bushiness. Finally, we provide an excellent proxy for 'convergence' based on computing transient lengths originating at random states rather than GoE states.
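The STG quantities named here can be computed directly for a small network. The Python sketch below uses a hypothetical 3-node network with synchronous updates, enumerates all 2^n states to find the garden-of-Eden states (states with no predecessor), and computes transient lengths (updates before a trajectory enters its attractor), including the average over random initial states that the abstract proposes as a proxy for convergence. The network and the synchronous update scheme are assumptions; the paper's bushiness and convergence measures themselves are not reproduced.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


def example_step(state: State) -> State:
    """Hypothetical 3-node network, synchronous update: A' = B AND NOT C, B' = A, C' = A OR B."""
    a, b, c = state
    return (int(b and not c), a, int(a or b))


def garden_of_eden(n: int, step: Callable[[State], State]) -> List[State]:
    """States with no predecessor in the state transition graph (exhaustive enumeration)."""
    states = list(product((0, 1), repeat=n))
    images = {step(s) for s in states}
    return [s for s in states if s not in images]


def transient_length(state: State, step: Callable[[State], State]) -> int:
    """Number of updates before the trajectory first enters its attractor."""
    first_visit: Dict[State, int] = {}
    t = 0
    while state not in first_visit:
        first_visit[state] = t
        state = step(state)
        t += 1
    return first_visit[state]  # the first repeated state lies on the attractor


def mean_transient_from_random(n: int, step: Callable[[State], State],
                               samples: int, rng: random.Random) -> float:
    """Average transient length from uniformly random initial states."""
    total = 0
    for _ in range(samples):
        start = tuple(rng.randint(0, 1) for _ in range(n))
        total += transient_length(start, step)
    return total / samples


print(garden_of_eden(3, example_step))
print(mean_transient_from_random(3, example_step, 1000, random.Random(0)))
```

Exhaustive enumeration is only feasible for small n, which is why sampling random initial states, as in the last function, scales better.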


Subjects
Algorithms , Models, Genetic , Gene Regulatory Networks , Cellular Automata
15.
J Comput Biol ; 31(4): 294-311, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38621180

ABSTRACT

Whole Genome Duplications (WGDs) are events that double the content and structure of a genome. In some organisms, multiple WGD events have been observed while loss of genetic material is a typical occurrence following a WGD event. The requirement of classic rearrangement models that every genetic marker has to occur exactly two times in a given problem instance, therefore, poses a serious restriction in this context. The Double-Cut and Join (DCJ) model is a simple and powerful model for the analysis of large structural rearrangements. After being extended to the DCJ-Indel model, capable of handling gains and losses of genetic material, research has shifted in recent years toward enabling it to handle natural genomes, for which no assumption about the distribution of markers has to be made. The traditional theoretical framework for studying WGD events is the Genome Halving Problem (GHP). While the GHP is solved for the DCJ model for genomes without losses, there are currently no exact algorithms utilizing the DCJ-Indel model that are able to handle natural genomes. In this work, we present a general view on the DCJ-Indel model that we apply to derive an exact polynomial time and space solution for the GHP on genomes with at most two genes per family before generalizing the problem to an integer linear program solution for natural genomes.


Subjects
Algorithms , Genome , Models, Genetic , Genome/genetics , Gene Duplication , Evolution, Molecular
16.
J Comput Biol ; 31(4): 312-327, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38634854

ABSTRACT

Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
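For orientation, the Python sketch below computes the rooted Robinson-Foulds distance as the size of the symmetric difference between two trees' cluster sets (leaf sets below internal nodes), and then a hypothetical asymmetric cost in the same spirit as cluster affinity: each cluster of the first tree is matched to its closest cluster in the second, and the mismatch sizes are summed. That second function is a simplification written for illustration; it is not the paper's asymmetric CA cost or Cluster Support cost, whose exact definitions and normalizations may differ. Trees are written as nested tuples of leaf names.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal node of a rooted tree written as nested tuples."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Rooted RF distance: size of the symmetric difference of the two cluster sets."""
    return len(clusters(t1) ^ clusters(t2))


def nearest_cluster_cost(t1: Tree, t2: Tree) -> int:
    """Hypothetical asymmetric cost: for each cluster of t1, the smallest symmetric
    difference to any cluster of t2, summed over t1's clusters."""
    c2 = clusters(t2)
    return sum(min(len(c ^ d) for d in c2) for c in clusters(t1))


T1 = ((("A", "B"), "C"), "D")
T2 = ((("A", "C"), "B"), "D")
T3 = (("A", "B"), "C")  # a smaller tree on a subset of the leaves
print(robinson_foulds(T1, T2), nearest_cluster_cost(T1, T2))
print(nearest_cluster_cost(T3, T1), nearest_cluster_cost(T1, T3))  # the cost is asymmetric
```

Because the nearest-cluster cost gives partial credit for almost-matching clusters, it grows more gradually than RF when a single leaf is misplaced, which is the sensitivity issue the abstract raises about RF.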


Subjects
Algorithms , Phylogeny , Cluster Analysis , Models, Genetic , Computational Biology/methods , Evolution, Molecular
17.
BMC Genomics ; 25(1): 349, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589806

ABSTRACT

Fleece traits are economically important traits of goats. With the falling cost of sequencing and genotyping and the improvement of related technologies, genomic selection in goats has become possible. This study collected pedigree, phenotype, and genotype information on 2299 Inner Mongolia Cashmere goats (IMCGs). We estimated fixed effects and compared the estimates of variance components, heritability, and genomic predictive ability for fleece traits in IMCGs when using pedigree-based Best Linear Unbiased Prediction (ABLUP), Genomic BLUP (GBLUP), or single-step GBLUP (ssGBLUP). The fleece traits considered were cashmere production (CP), cashmere diameter (CD), cashmere length (CL), and fiber length (FL). Year of production, sex, herd, and individual age had highly significant effects on all four fleece traits (P < 0.01), so all of these factors should be considered when evaluating the genetic parameters of fleece traits in IMCGs. The heritabilities of FL, CL, CP, and CD estimated with the ABLUP, GBLUP, and ssGBLUP methods were 0.26 ~ 0.31, 0.05 ~ 0.08, 0.15 ~ 0.20, and 0.22 ~ 0.28, respectively. Given its low heritability, genetic progress for CL can be expected to be relatively slow. The predictive ability for fleece traits in IMCGs with GBLUP (56.18% to 69.06%) and ssGBLUP (66.82% to 73.70%) was significantly higher than with ABLUP (36.73% to 41.25%). The predictive ability of ssGBLUP was significantly higher than that of ABLUP (by 29% ~ 33%) and slightly higher than that of GBLUP (by 4% ~ 14%). ssGBLUP is therefore the superior method for genomic selection of fleece traits in Inner Mongolia Cashmere goats.


Subjects
Genome , Goats , Humans , Animals , Goats/genetics , Genomics/methods , Phenotype , Genotype , Models, Genetic
18.
PeerJ ; 12: e17248, 2024.
Article in English | MEDLINE | ID: mdl-38666077

ABSTRACT

Whereas undetected species contribute to the estimation of species diversity, undetected alleles have not been used to estimate genetic diversity. Although random sampling guarantees unbiased estimation of allele frequency and genetic diversity measures, using undetected alleles may provide biased but more precise estimators that are useful for conservation. We devised a kernel density estimation (KDE) approach for allele frequency that includes undetected alleles, and tested it for estimating allele frequency and nucleotide diversity using populations generated by coalescent simulation as well as real population data. Contrary to expectations, nucleotide diversity estimated by KDE had worse bias and accuracy. Allele frequency estimated by KDE was also worse, except when the sample size was small. These results might be due to the finite size of populations and/or the curse of dimensionality. In conclusion, KDE of allele frequency does not contribute to improved genetic diversity estimation.
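For reference, nucleotide diversity (π), one of the measures estimated in this study, is conventionally estimated from a sample as the mean number of pairwise differences per site. The sketch below is that textbook sample estimator, not the KDE method, whose details the abstract does not give. The function name and the requirement of equal-length, gap-free aligned sequences are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```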


Subjects
Alleles , Gene Frequency , Genetic Variation , Genetic Variation/genetics , Humans , Models, Genetic , Computer Simulation , Genetics, Population/methods
19.
PLoS One ; 19(4): e0300900, 2024.
Article in English | MEDLINE | ID: mdl-38662751

ABSTRACT

Many questions in evolutionary biology require the specification of a phylogeny for downstream phylogenetic analyses. However, with the increasingly widespread availability of genomic data, phylogenetic studies are often confronted with conflicting signal in the form of genomic heterogeneity and incongruence between gene trees and the species tree. This raises the question of determining what data and phylogeny should be used in downstream analyses, and to what extent the choice of phylogeny (e.g., gene trees versus species trees) impacts the analyses and their outcomes. In this paper, we study this question in the realm of phylogenetic diversity indices, which provide ways to prioritize species for conservation based on their relative evolutionary isolation on a phylogeny, and are thus one example of downstream phylogenetic analyses. We use the Fair Proportion (FP) index, also known as the evolutionary distinctiveness score, and explore the variability in species rankings based on gene trees as compared to the species tree for several empirical data sets. Our results indicate that prioritization rankings among species vary greatly depending on the underlying phylogeny, suggesting that the choice of phylogeny is a major influence in assessing phylogenetic diversity in a conservation setting. While we use phylogenetic diversity conservation as an example, we suspect that other types of downstream phylogenetic analyses such as ancestral state reconstruction are similarly affected by genomic heterogeneity and incongruence. Our aim is thus to raise awareness of this issue and inspire new research on which evolutionary information (species trees, gene trees, or a combination of both) should form the basis for analyses in these settings.
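The Fair Proportion index splits each branch length equally among the leaves descending from that branch. It then sums these shares along the path from the root to each species. The sketch below computes the index on a rooted tree given as a hypothetical child-list dictionary with branch lengths, then ranks species by score. Running it once on a species tree and once on each gene tree would reproduce the kind of ranking comparison described above. The tree encoding and function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: parent -> list of (child, branch length); leaves have no entry.
Tree = Dict[str, List[Tuple[str, float]]]


def fair_proportion(tree: Tree, root: str) -> Dict[str, float]:
    """Fair Proportion (evolutionary distinctiveness) score of every leaf."""
    n_leaves: Dict[str, int] = {}

    def count(node: str) -> int:
        children = tree.get(node, [])
        n = 1 if not children else sum(count(c) for c, _ in children)
        n_leaves[node] = n
        return n

    count(root)
    scores: Dict[str, float] = {}

    def descend(node: str, acc: float) -> None:
        children = tree.get(node, [])
        if not children:
            scores[node] = acc
            return
        for child, length in children:
            # Each branch is shared equally among the leaves below it.
            descend(child, acc + length / n_leaves[child])

    descend(root, 0.0)
    return scores


def ranking(scores: Dict[str, float]) -> List[str]:
    """Species ordered from highest to lowest conservation priority."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    species_tree: Tree = {"root": [("x", 1.0), ("c", 3.0)], "x": [("a", 1.0), ("b", 1.0)]}
    fp = fair_proportion(species_tree, "root")
    print(fp, ranking(fp))
```

The scores always sum to the total tree length, since every branch is distributed in full among its descendant leaves.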


Subjects
Phylogeny , Evolution, Molecular , Animals , Models, Genetic
20.
Genet Sel Evol ; 56(1): 18, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38459504

ABSTRACT

BACKGROUND: Validation by data truncation is common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (the correlation between pre-adjusted phenotypes and estimated breeding values (EBV), divided by the square root of the heritability) and the linear regression (LR) method (comparison of "early" and "late" EBV). Both methods rely on the whole dataset and a partial dataset obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for predictivity, or against EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or by bootstrapping. Analytical confidence intervals would be beneficial to avoid running several validations and to test the quality of the bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method. RESULTS: We derived standard errors and Wald confidence intervals for predictivity and for the statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and on the prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. We show the adequacy of both the analytical and the approximated analytical confidence intervals and compare them with bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tended to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping. CONCLUSIONS: Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrapping is possible for any dataset with the formulas presented in this study.
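The point estimates behind these intervals are simple to compute. The sketch below assumes the definitions commonly stated for the LR method, which this abstract does not spell out:

- bias is the mean partial EBV minus the mean whole EBV;
- dispersion is the slope of the regression of whole on partial EBV;
- the ratio of accuracies is the correlation between partial and whole EBV.

Predictivity follows the definition above. For the correlation-type statistics, the sketch adds a textbook Fisher-z interval with standard error 1/sqrt(n - 3). That interval assumes independent pairs and is not the paper's derived standard error. The reliability statistic is omitted because it needs relationships and prediction error (co)variances. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean
from typing import Dict, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def _cor(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation."""
    return _cov(x, y) / math.sqrt(_cov(x, x) * _cov(y, y))


def lr_statistics(ebv_partial: Sequence[float],
                  ebv_whole: Sequence[float]) -> Dict[str, float]:
    """Assumed LR-method point estimates for the validation individuals."""
    return {
        "bias": fmean(ebv_partial) - fmean(ebv_whole),
        "dispersion": _cov(ebv_partial, ebv_whole) / _cov(ebv_partial, ebv_partial),
        "ratio_of_accuracies": _cor(ebv_partial, ebv_whole),
    }


def predictivity(y_adj: Sequence[float], ebv_partial: Sequence[float],
                 h2: float) -> float:
    """Correlation of pre-adjusted phenotypes with partial EBV, divided by sqrt(h2)."""
    return _cor(y_adj, ebv_partial) / math.sqrt(h2)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Textbook confidence interval for a correlation via Fisher's z = atanh(r)."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For predictivity, one simple option under the same assumptions is to apply `fisher_ci` to the raw correlation and divide both endpoints by sqrt(h2), treating the heritability as known.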


Subjects
Genomics , Models, Genetic , Humans , Genotype , Reproducibility of Results , Confidence Intervals , Pedigree , Genomics/methods , Phenotype